Calculated Columns
The Calculated Columns node enables users to manipulate columns during the data flow. Once the node has been connected to the relevant table, provide a PQL script in order to create or manipulate column values, or perform calculations on existing column values, and add these manipulations as an additional column.
There are two ways to provide a script:
- Write your script in the Calculated Column window
- Write your script in the PQL Editor
To add multiple calculated columns to the same table, add another calculated column node and connect it to the existing one. functions.
Calculated Columns Script Window
Write a PQL expression in the script window (red highlight below):
- Use square brackets to select a column
- PQL scripts are case-sensitive - make sure that column headings are written in the correct case
Under Calculated Column Name provide a name for the new column, and under Calculate Column Type select the relevant column type (green highlight)
PQL Editor
The PQL Editor is a script-writing interface with drag and drop functionality. Click here to learn more.
Users can open the PQL Editor by clicking the Advanced Script button (blue arrow above) in the Calculated Column window.
Watch this video to learn how to create or manipulate column values, or perform calculations on existing column values.
Add a Calculated Column
STEP 1
Add the Calculated Column node to the canvas and connect it to the relevant table.
STEP 2
In the Script window, write your PQL script. Open the PQL editor (red arrow) to write your PQL expression by selecting the required functions, columns, variables, and elements. For details about PQL, click here.
Name your calculated column, and select the column type (green highlight).
STEP 3
Click the to preview.
Examples
Below, the PQL sum function has been applied to create an additional column combining the values from the Cost, Overhead, and Expenses columns:
sum([Cost],[OverHead],[Expenses])
In this example, an If statement is entered into the script window, to generate a new column that will return a result of either High Margin or Low Margin, based on the Margin and Overhead columns. If the margin is greater than 0.3, the calculated column will return High Margin; if it's equal to or less than 0.3, it will return Low Margin:
IF([Margin] > 0.3, "High Margin", "Low Margin")
Case conditional logic enables you to build a decision within the result of the formula. The index of the given criteria should match the index of the outcome. The syntax for a case statement is:
- <critera_set>: insert the set of criteria (columns)
- <outcome_set>: insert the index of outcomes - these are the results that should be returned when one of the criteria is met. The outcome set index should match the criteria set index.
- <default_value>: insert the default value that will be returned when the given criteria are not met
In this example, a column is added with the number of cars for each customer, based on the following statement:
case(criterias([Cars]=0,[Cars]=1,[Cars]=2), dataSet("no car", "single", "two"),"na")
If the number of cars is 0, the statement will return the string "no car". If the number of cars is 1, "single" will be returned, and if it's 2, "two" will be returned. If the number of cars is not equal to 0, 1, or 2, then "na" will be returned.
'Case' Conditional Logic 2
In this example, the statement will return a column with the following strings: "young couple without kids," "young couple with kids," "near empty nesters," or "na." The results returned will depend on combinations of the following criteria: age, number of children, and marital status.
- Where age < 30 and children = 0 and marital status = married, "young couple without kids" is returned
- Where age < 30 and children > 0 and marital status = married, "young couple with kids" is returned
- Where age > 64 and children > 0 and marital status = married, "near empty nesters" is returned
- Where none of the above criteria sets match the customer profile, the default value of "na" is returned.
In PQL the statement is:
case(criterias(([Age]<30) && ([Children]=0) && ([Marital Status]="Married"),([Age]<30) && ([Children]>0) && ([Marital Status]="Married"),([Age]>64) && ([Children]=0) && ([Marital Status]="Married")), dataSet("young couple without kids", "young couple with kids", "near empty nesters"),"na")
Note that in this example, each criterion in the <criteria_set> contains 3 criteria. Enclose each criterion in brackets (), and use && to join each set of 3 criteria as one criterion within the <criteria_set>. As usual, use a comma as the delimiter between each of the criteria in the <criteria_set>, i.e.
([Age]<30) && ([Children]=0) && ([Marital Status]="Married"),